Lazy Restless Bandits for Decision Making with Limited Observation Capability: Applications in Wireless Networks
Abstract
In this work we formulate the problem of restless multi-armed bandits with cumulative feedback and partially observable states. We call these bandits lazy restless bandits (LRB) because they are slow to act and allow multiple system state transitions during every decision interval. Rewards for each action are state dependent. The states of the arms are hidden from the decision maker. The goal of the decision maker is to choose one of the M arms at the beginning of each decision interval such that the long-term cumulative reward is maximized. This work is motivated by applications in wireless networks such as relay selection, opportunistic channel access, and downlink scheduling under evolving channel conditions. The Whittle index policy for solving the LRB problem is analyzed. In the course of doing so, various structural properties of the value functions are proved. Further, closed-form index expressions are provided for two sets of special cases; for general cases, an algorithm for index computation is provided. A comparative study based on extensive numerical simulations is presented; the performance of the Whittle index policy and the myopic policy is compared with that of other policies such as uniform random, non-uniform random, and round-robin.
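As a rough illustration of the "lazy" setting described above, the sketch below propagates a belief about a hidden arm state through several transitions per decision interval and then applies a myopic choice. The abstract does not specify the state space or the feedback model, so everything here is an assumption made for illustration: each arm is taken to be a two-state (good/bad) Markov chain, and the names `k_step_belief`, `myopic_choice`, `p01`, `p11`, and the reward values are hypothetical. It is not the paper's Whittle index computation.

```python
def k_step_belief(belief, p01, p11, k):
    """Propagate P(arm is in the good state) through k unobserved transitions.

    Assumed model (not from the abstract): a two-state Markov chain with
    p01 = P(bad -> good) and p11 = P(good -> good).
    """
    for _ in range(k):
        belief = belief * p11 + (1.0 - belief) * p01
    return belief


def myopic_choice(beliefs, rewards_good, rewards_bad):
    """Return the index of the arm with the largest expected immediate reward."""
    expected = [
        b * rg + (1.0 - b) * rb
        for b, rg, rb in zip(beliefs, rewards_good, rewards_bad)
    ]
    return max(range(len(expected)), key=lambda i: expected[i])


if __name__ == "__main__":
    # Hypothetical example: two arms, three hidden transitions per interval.
    beliefs = [k_step_belief(1.0, 0.2, 0.9, 3), k_step_belief(0.0, 0.4, 0.6, 3)]
    arm = myopic_choice(beliefs, rewards_good=[1.0, 1.0], rewards_bad=[0.0, 0.0])
    print("beliefs:", beliefs, "chosen arm:", arm)
```

Under these assumptions, a larger `k` pulls every belief toward the chain's stationary probability, which is one way to see why infrequent decisions reduce what the decision maker knows about each arm.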
Similar resources
Coverage Improvement In Wireless Sensor Networks Based On Fuzzy-Logic And Genetic Algorithm
Wireless sensor networks have been widely considered one of the most important 21st-century technologies and are used in many applications such as environmental monitoring, security, and surveillance. Wireless sensor networks are used when it is not possible or convenient to supply signaling or power supply wires to a wireless sensor node. The wireless sensor node must be battery powered. C...
Optimization of Energy Consumption in Image Transmission in Wireless Sensor Networks (WSNs) using a Hybrid Method
In wireless sensor networks (WSNs), sensor nodes have limited resources with regard to computation, storage, communication bandwidth, and the most important of all, energy supply. In addition, in many applications of sensor networks, we need to send images to a sink node. Therefore, we have to use methods for sending images in which the number and volume of packets are optim...
Multi-armed restless bandits, index policies, and dynamic priority allocation
This paper presents a brief introduction to the emerging research field of multi-armed restless bandits (MARBs), which substantially extend the modeling power of classic multi-armed bandits. MARBs are Markov decision process models for optimal dynamic priority allocation to a collection of stochastic binary-action (active/passive) projects evolving over time. Interest in MARBs has grown steadil...
Prioritization the Criteria of Wireless Sensor Networks in the Rehabilitation Supervision Using the Fuzzy MCDM Approach
Introduction: "Wireless sensor network"-based rehabilitation is one of the major issues in hospitals. The purpose of this study was to prioritize the criteria of wireless sensor networks in rehabilitation supervision using the fuzzy MCDM approach. Methods: In this descriptive study, the population consisted of all doctors and nurses in Tehran Day's Hospital, with 210 people. From the...
Restless Bandits with Constrained Arms: Applications in Social and Information Networks
We study a problem of information gathering in a social network with dynamically available sources and time varying quality of information. We formulate this problem as a restless multi-armed bandit (RMAB). In this problem, information quality of a source corresponds to the state of an arm in RMAB. The decision making agent does not know the quality of information from sources a priori. But the...
Journal: CoRR
Volume: abs/1801.01301
Publication date: 2018